fix(bootstrap): surface Helm install failure on namespace timeout (#211) #486

Open
Manoj-engineer wants to merge 1 commit into NVIDIA:main from Manoj-engineer:fix/helm-error-diagnosis-211
Vouched at: #420

Summary

When gateway start times out waiting for the openshell namespace, the timeout
path now checks for failed helm-install-* jobs in kube-system and surfaces
the actual Helm error and the last 30 log lines instead of the generic "namespace not ready" message.
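The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `HelmFailure` struct, the `last_lines` and `namespace_timeout_message` helpers, and the message wording are all hypothetical. Only the 30-line log tail and the fallback to the generic message come from the description.

```rust
/// Hypothetical container for what the diagnosis step collects.
struct HelmFailure {
    job: String,
    conditions: String,
    logs: String,
}

/// Keep only the last `n` lines of `text`.
fn last_lines(text: &str, n: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    let start = lines.len().saturating_sub(n);
    lines[start..].join("\n")
}

/// Build the timeout message (wording is illustrative, not the PR's).
/// With no diagnosis available, only the generic message is returned.
fn namespace_timeout_message(namespace: &str, failure: Option<&HelmFailure>) -> String {
    let base = format!(
        "timed out waiting for namespace '{}': namespace not ready",
        namespace
    );
    match failure {
        Some(f) => format!(
            "{}\nHelm job {} failed:\n{}\nLast log lines:\n{}",
            base,
            f.job,
            f.conditions,
            last_lines(&f.logs, 30)
        ),
        None => base,
    }
}
```

Keeping the generic message as the prefix means existing callers that match on it would still work when no failed Helm job is found.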

Related Issue

Fixes #211

Changes

  • Add diagnose_helm_failure() in openshell-bootstrap/src/lib.rs that queries
    helm-install-* jobs in kube-system for failed pods and returns job conditions
    plus the last 30 log lines
  • Wire into wait_for_namespace() final timeout branch
  • Fix awk filter: status.failed stays <none> during the backoff retry window,
    so filter on != "0" instead of != "<none>" && != "0" to catch actively failing jobs
  • Add unit test helm_failure_hint_is_included_in_namespace_timeout_message
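The filter change in the list above can be illustrated with a small Rust sketch. It assumes the input is a two-column table of job name and `status.failed`, such as `kubectl get jobs -n kube-system --no-headers -o custom-columns=NAME:.metadata.name,FAILED:.status.failed` would print. That command and the `failing_helm_jobs` name are assumptions for illustration; the PR itself applies the condition in an awk filter.

```rust
/// Return the names of helm-install-* jobs whose FAILED column is not "0".
/// A value of "<none>" is deliberately kept: per the PR description,
/// status.failed can still be unset while the job is in its backoff
/// retry window, so excluding "<none>" would miss actively failing jobs.
fn failing_helm_jobs(table: &str) -> Vec<String> {
    table
        .lines()
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            let name = cols.next()?;
            let failed = cols.next()?;
            (name.starts_with("helm-install-") && failed != "0")
                .then(|| name.to_string())
        })
        .collect()
}
```

One consequence of this condition is that a job whose `status.failed` has simply never been set is also treated as a candidate, so the follow-up step of reading job conditions and logs is what decides whether there is a real failure to report.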

Testing

  • mise run pre-commit passes
  • Unit tests added/updated — 78/78 pass
  • E2E tests added/updated (not applicable — error path only)
  • Live-tested end-to-end: built a gateway image with corrupted serviceaccount.yaml,
    confirmed the Helm error appears in the terminal output on timeout

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (not applicable)

…IDIA#211)

Signed-off-by: Manoj-engineer <194872717+Manoj-engineer@users.noreply.github.com>
@Manoj-engineer Manoj-engineer requested a review from a team as a code owner March 19, 2026 21:42
@drew drew self-assigned this Mar 20, 2026
@drew drew added the test:e2e Requires end-to-end coverage label Mar 21, 2026
Labels

test:e2e Requires end-to-end coverage

Development

Successfully merging this pull request may close these issues.

Improve error message when Helm chart has malformed YAML

2 participants